Building Statistical Models in Python by Huy Hoang Nguyen Paul N Adams Stuart J Miller

Author: Huy Hoang Nguyen, Paul N Adams, Stuart J Miller
Language: eng
Format: epub
Publisher: Packt Publishing Pvt Ltd
Published: 2023-08-30


Statistical methods for feature selection

Statistical methods for feature selection rely on the primary tool that we have used throughout the previous chapters: statistical significance. The methods presented in this subsection are based on the statistical properties of the features themselves. We will cover two statistical methods for feature selection: correlation and statistical significance.

Correlation

The first statistical method we will discuss is correlation. We have discussed correlation earlier in this chapter and in previous chapters; recall that correlation describes the relationship between two variables. Variables can be positively correlated, uncorrelated, or negatively correlated. For feature selection, we want to remove features that are uncorrelated with the response variable. A feature that is uncorrelated with the response variable has no linear relationship with it, so it is unlikely to be a good predictor of the response variable, at least in a linear model.
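
To make the three cases concrete, here is a minimal Python sketch, assuming NumPy is available and using synthetic data invented purely for illustration (the variable names and the helper `pearson_r` are hypothetical, not taken from the book). It builds one feature that rises with the response, one that falls as the response rises, and one generated independently of it, then prints Pearson's r for each. The final lines illustrate a caveat: a feature can be completely determined by another variable and still have r close to zero when the relationship is nonlinear.

```python
import numpy as np

# Synthetic data, invented for illustration only.
rng = np.random.default_rng(seed=0)
n = 500
y = rng.normal(size=n)  # response variable

x_pos = 2.0 * y + rng.normal(scale=0.5, size=n)   # tends to rise with y
x_neg = -1.5 * y + rng.normal(scale=0.5, size=n)  # tends to fall as y rises
x_none = rng.normal(size=n)                       # generated independently of y


def pearson_r(x, target):
    """Hypothetical helper: Pearson's r between two 1-D arrays."""
    return np.corrcoef(x, target)[0, 1]


for name, x in [("positive", x_pos), ("negative", x_neg), ("uncorrelated", x_none)]:
    print(f"{name:>12}: r = {pearson_r(x, y):+.2f}")

# Caveat: r measures only *linear* association. Here the second variable is
# an exact function of the first, yet r is essentially zero because the
# relationship is symmetric and nonlinear.
x_sym = np.linspace(-3, 3, 101)
print(f"{'quadratic':>12}: r = {pearson_r(x_sym, x_sym**2):+.2f}")
```

In this sketch, a feature like `x_none` is the kind the text suggests removing, while the quadratic case is a reminder to inspect scatterplots before discarding a feature on correlation alone.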

Recall from Chapter 4, Parametric Tests, that we can use Pearson’s correlation coefficient to measure the linear correlation between two variables. We can calculate this coefficient between each feature and the target variable. After performing those calculations, we can construct a correlation ranking, as shown in Figure 7.7.
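
A correlation ranking like this can be sketched with pandas. The following is a minimal example, assuming pandas and NumPy are available; the dataset, column names (`feature_a` through `feature_d`), and the 0.2 cutoff are hypothetical choices made for illustration, and Figure 7.7 is not reproduced here, so this sketch does not claim to match how that figure was built. It uses `DataFrame.corrwith`, which computes Pearson's r between each column and a Series by default, and ranks features by the absolute value of r.

```python
import numpy as np
import pandas as pd

# Hypothetical data: four candidate features; the target depends only on
# feature_a and feature_c (plus noise). Names and coefficients are illustrative.
rng = np.random.default_rng(seed=42)
n = 1_000
X = pd.DataFrame(
    {name: rng.normal(size=n) for name in ["feature_a", "feature_b", "feature_c", "feature_d"]}
)
y = 3.0 * X["feature_a"] - 1.5 * X["feature_c"] + rng.normal(size=n)

# Pearson's r between each feature column and the target
# (corrwith uses method="pearson" by default).
correlations = X.corrwith(y)

# Rank by |r|: a strong negative correlation is as informative as a strong positive one.
order = correlations.abs().sort_values(ascending=False).index
ranking = correlations.reindex(order)
print(ranking.round(3))

# Keep features whose |r| meets a cutoff; 0.2 is an arbitrary, assumed value.
threshold = 0.2
selected = ranking[ranking.abs() >= threshold].index.tolist()
print("Selected features:", selected)
```

Ranking by |r| rather than by signed r keeps a strongly negative predictor such as `feature_c` near the top; sorting by the signed value would push it to the bottom of the list alongside features that have no linear relationship with the target.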


